Today sees the publication of a thought piece by six authors, the first of whom, alphabetically and (in my view anyhow) in intellectual standing, is Adam Bosworth.

The problem they're addressing is how, in the Web Services context, you put together a package of information, some of which is XML and some of which is binary (a digital signature, an audio clip, whatever). After a pretty thorough walk-through of the issues, they come to two conclusions:

  • It's better to pack it all into the XML message than use a trick like Multipart-MIME or one of its relatives to wrap it up.
  • They're worried about the memory and processing costs of textifying the binary data to pack it into XML, which you have to do as XML is currently defined.

Their first conclusion is awfully hard to argue with. The second, though, really needs further exploration; the best they can do to quantify their worries is the following bit of hand-waving:

It is well known that base64 encoded data expands by a factor of 1.33x original size, and that hexadecimal encoded data expands by a factor of 2x (assuming an underlying UTF-8 text encoding in both cases; if the underlying text encoding is UTF-16, these numbers double). Also of concern is the overhead in processing costs (both real and perceived) for these formats, especially when decoding back into raw binary. When comparing base64 decoding to a straight-through copy of opaque data, the throughput of at least one popular programming system decreased by a factor of 3 or more.

Without some good hard quantitative evidence, I'd be inclined to argue that we should just use base64 to address this problem until someone proves it's too expensive. A 33% size increase seems pretty cheap to me if what it buys you is fitting smoothly into the community of XML tools and expertise; and the cost of that 33% obviously depends on how much of what's being transmitted is binary, for which we currently have no numbers.
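The size figures in the quote are easy to check, and the point about mixed payloads can be made concrete. What follows is a minimal Python sketch, assuming random binary data and the standard `base64` module; `expansion_ratios` and `message_growth` are hypothetical helper names, and the message sizes used with them are illustrative, not measurements of real Web Services traffic.

```python
import base64
import os


def expansion_ratios(n_bytes: int) -> dict:
    """Return encoded-size / raw-size for base64 and hex, as UTF-8 and UTF-16 text."""
    raw = os.urandom(n_bytes)
    b64 = base64.b64encode(raw).decode("ascii")
    hx = raw.hex()
    return {
        "base64/utf-8": len(b64.encode("utf-8")) / n_bytes,
        # utf-16-le avoids counting a byte-order mark in the total.
        "base64/utf-16": len(b64.encode("utf-16-le")) / n_bytes,
        "hex/utf-8": len(hx.encode("utf-8")) / n_bytes,
        "hex/utf-16": len(hx.encode("utf-16-le")) / n_bytes,
    }


def message_growth(xml_bytes: int, binary_bytes: int) -> float:
    """Overall growth of a message whose binary part is base64'd into UTF-8 XML."""
    encoded = 4 * ((binary_bytes + 2) // 3)  # base64 output length, including padding
    return (xml_bytes + encoded) / (xml_bytes + binary_bytes)


if __name__ == "__main__":
    for name, ratio in expansion_ratios(3000).items():
        print(f"{name}: {ratio:.3f}x")
    print(f"9000 bytes XML + 3000 bytes binary: {message_growth(9000, 3000):.3f}x")
```

With a payload size divisible by three, base64 comes out at exactly 4/3 and hex at 2 in UTF-8, and both double in UTF-16. The second helper shows why the binary share matters: a hypothetical message with 9,000 bytes of XML and 3,000 bytes of binary grows by about 8% overall, not 33%.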

Also, on the face of it, I'd think that someone whose code runs three times slower because there's base64 in the loop should be taken out and shot, whether they're using a "popular programming system" or not.
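One way to see why a 3x gap between decoding and a raw copy need not make a whole program 3x slower is to time the two operations in isolation. Below is a hypothetical micro-benchmark using only the Python standard library; whatever it prints is specific to your machine and runtime, and it measures only the decode step, not the network I/O, parsing, and application work that surround it in a real Web Services stack.

```python
import base64
import os
import timeit


def compare_decode_to_copy(n_bytes: int = 1 << 20, repeats: int = 20) -> dict:
    """Time base64 decoding against a straight byte copy of the same payload.

    Returns the best-of-`repeats` seconds per operation and their ratio.
    """
    raw = os.urandom(n_bytes)
    encoded = base64.b64encode(raw)
    # Sanity check: the round trip must reproduce the original bytes.
    assert base64.b64decode(encoded) == raw

    # bytearray(raw) forces a real copy; bytes(raw) would just return raw itself.
    copy_t = min(timeit.repeat(lambda: bytearray(raw), number=1, repeat=repeats))
    decode_t = min(timeit.repeat(lambda: base64.b64decode(encoded), number=1, repeat=repeats))
    return {"copy_s": copy_t, "decode_s": decode_t, "ratio": decode_t / copy_t}


if __name__ == "__main__":
    result = compare_decode_to_copy()
    print(
        f"copy {result['copy_s'] * 1e3:.3f} ms, "
        f"decode {result['decode_s'] * 1e3:.3f} ms, "
        f"ratio {result['ratio']:.1f}x"
    )
```

A copy runs at close to memory bandwidth, so almost any real transformation looks slow next to it; the interesting number is the decode time as a fraction of total request handling, which this sketch deliberately does not try to guess.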

Let's Standardize xml:binary · The problem of wanting to stick binary data into XML is not limited to the SOAP world; in fact it's common enough that it arguably ought to be part of the basic XML processing machinery, so that you shouldn't have to have a schema to use it.

We already have xml:lang, xml:space, and xml:base, as part of the low-level infrastructure. Let's introduce a new attribute xml:binary with possible values (for now) hex and base64, so that you could take any old element and say:

<anyOldElement xml:binary="base64">

It would be easy, cheap, useful and break no existing software.
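To make the "no schema needed" point concrete, here is a minimal sketch of how a processor might honor such an attribute, using Python's `xml.etree.ElementTree`. Note that xml:binary is only the proposal above, not a standard attribute; the function name `binary_content` is hypothetical. The sketch relies on the fact that the xml: prefix is bound by definition to `http://www.w3.org/XML/1998/namespace`, so the parser reports the attribute in that namespace without any declaration in the document.

```python
import base64
import binascii
import xml.etree.ElementTree as ET
from typing import Optional

# The xml: prefix is predefined, so ElementTree reports xml:binary under this URI.
XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_BINARY = f"{{{XML_NS}}}binary"  # hypothetical: xml:binary is proposed, not standard


def binary_content(elem: ET.Element) -> Optional[bytes]:
    """Decode an element's text according to its (hypothetical) xml:binary attribute.

    Returns None when the element carries no xml:binary attribute.
    """
    encoding = elem.get(XML_BINARY)
    if encoding is None:
        return None
    # Encoded text is often wrapped across lines; drop all whitespace first.
    text = "".join((elem.text or "").split())
    if encoding == "base64":
        return base64.b64decode(text, validate=True)
    if encoding == "hex":
        return binascii.unhexlify(text)
    raise ValueError(f"unsupported xml:binary value: {encoding!r}")


if __name__ == "__main__":
    doc = ET.fromstring(
        '<msg><sig xml:binary="base64">SGVs\n bG8=</sig>'
        '<clip xml:binary="hex">cafe</clip><note>plain</note></msg>'
    )
    for child in doc:
        print(child.tag, binary_content(child))
    # sig b'Hello'
    # clip b'\xca\xfe'
    # note None
```

Because the attribute lives in the reserved xml namespace, an existing processor that knows nothing about it simply sees one more attribute and passes the text through unchanged, which is the "break no existing software" property.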

February 26, 2003
